Main Method Families & Architectural Paradigms

Cross-cutting model classes, architectural families and meta-techniques that appear across multiple learning paradigms (supervised / unsupervised / self-supervised / reinforcement / semi-supervised).

Linear / Parametric Family (classical, interpretable, fast baseline)

  • Linear Models / Generalized Linear Models (GLMs): Family of models where the prediction is a linear combination of features (weighted sum + bias), possibly transformed by a link function; includes ordinary linear regression, logistic regression, ridge/lasso/elastic net regularization, Poisson/gamma regression, etc.; very interpretable coefficients, fast training/inference, strong baseline on many problems; assumes linear relationships (or linear after the link); foundation for many extensions (e.g. single-layer perceptron → neural nets); scikit-learn groups them under the linear_model module.
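
A minimal scikit-learn sketch of the two most common GLM baselines, ridge-regularized linear regression and logistic regression; the synthetic data and hyperparameters below are purely illustrative.

```python
# Illustrative GLM baselines on made-up data (not from the text above).
import numpy as np
from sklearn.linear_model import Ridge, LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))

# Regression: y is a noisy linear combination of the features.
y_reg = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=200)
ridge = Ridge(alpha=1.0).fit(X, y_reg)            # L2-regularized linear regression
print("ridge coefficients:", ridge.coef_)          # directly interpretable weights

# Classification: logistic regression is a GLM with a logit link.
y_clf = (X[:, 0] + X[:, 2] > 0).astype(int)
logreg = LogisticRegression().fit(X, y_clf)
print("class probabilities:", logreg.predict_proba(X[:2]))
```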

Neural / Connectionist Family (dominant general-purpose paradigm)

  • Neural Networks: Universal function approximators built from layered artificial neurons with learnable weights and non-linear activations; trained via backpropagation and gradient descent; capable of hierarchical feature learning; backbone of almost all state-of-the-art performance on high-dimensional / perceptual data (images, audio, text, video, multimodal); includes shallow MLPs, deep architectures and virtually all modern specialized nets.

  • Deep Learning: Multi-layer neural networks (typically >3–5 hidden layers); the scalable, data-hungry, compute-intensive subset of neural networks that became dominant after 2012; automatic representation learning at multiple levels of abstraction; currently the most powerful and most widely deployed family across nearly every ML domain.

  • Transformers: Attention-based architecture introduced in 2017 (“Attention Is All You Need”); replaces recurrence with parallelizable self-attention + feed-forward layers + positional encodings; foundation of virtually all large language models (LLMs), vision transformers (ViT), multimodal models, time-series transformers, protein models, etc.; enables extremely long context, transfer learning and scaling laws; currently the most successful single architecture family in AI; the core attention operation is sketched below.
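
As a concrete illustration of the attention mechanism, here is a didactic NumPy sketch of single-head scaled dot-product self-attention, softmax(QKᵀ/√d)·V; real transformers add multiple heads, masking, positional encodings, residual connections and feed-forward layers, and the random weights below are placeholders.

```python
# Single-head scaled dot-product self-attention (toy weights, no batching).
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])              # pairwise attention logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over keys
    return weights @ V                                     # weighted sum of values

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 5, 16, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)                # -> (5, 8)
```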

Tree-based Family (classical, very strong on structured/tabular data)

  • Decision Trees: Recursive binary (or multi-way) partitioning of feature space via greedy impurity/variance reduction; highly interpretable, handle mixed types, invariant to monotonic transformations; base learner for most ensembles; suffer from high variance and overfitting without constraints.

  • Random Forest: Ensemble of independently trained decision trees via bootstrap aggregating (bagging) + a random feature subset at each split; excellent out-of-the-box performance, robust to overfitting, feature importance estimation; still among the strongest classical methods on tabular data (see the fitting sketch after this list).

  • Gradient Boosting Machines: Sequential ensemble in which each new tree corrects the residual errors of the previous ones (additive gradient descent in function space); state-of-the-art on tabular competitions for many years; popular implementations include XGBoost, LightGBM and CatBoost (with features such as ordered boosting, native categorical handling and GPU support).
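
A scikit-learn sketch comparing the two tree ensembles above on synthetic tabular data; the hyperparameters are arbitrary, and XGBoost, LightGBM and CatBoost expose similar scikit-learn-style fit/predict interfaces.

```python
# Illustrative comparison of bagged vs. boosted trees on synthetic tabular data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
gbm = GradientBoostingClassifier(n_estimators=300, learning_rate=0.05,
                                 random_state=0).fit(X_tr, y_tr)

print("random forest accuracy:", accuracy_score(y_te, rf.predict(X_te)))
print("gradient boosting accuracy:", accuracy_score(y_te, gbm.predict(X_te)))
print("top features by importance:", rf.feature_importances_.argsort()[-3:])
```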

Kernel / Margin-based Family

  • Support Vector Machines / Kernel Methods: Maximum-margin linear separator in the original or a kernel-induced high-dimensional space; the kernel trick allows implicit non-linear mappings (RBF, polynomial, sigmoid…); very effective in high-dimensional but low-sample regimes; still competitive on small/medium structured data; extends to SVR, kernel PCA, kernel ridge, etc.
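
A minimal sketch of an RBF-kernel SVM in scikit-learn; because kernel machines are sensitive to feature scale, it is wrapped in a standardization pipeline (data and hyperparameters are illustrative).

```python
# RBF-kernel SVM on a toy non-linear dataset, with feature standardization.
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
print("support vectors per class:", clf[-1].n_support_)
```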

Probabilistic / Bayesian Family

  • Gaussian Processes: Non-parametric probabilistic models that place a distribution over functions; provide full posterior predictive distributions (excellent uncertainty quantification); computationally expensive (O(n³) in the number of training points); very strong on small-to-medium data when uncertainty matters (Bayesian optimization, active learning, small-data regression); a usage sketch follows this list.

  • Naive Bayes & related probabilistic classifiers: Conditional independence assumption + Bayes' theorem; extremely fast, works surprisingly well on text / high-dimensional sparse data even when independence is violated; variants: Gaussian NB, Multinomial NB, Bernoulli NB, Complement NB.
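
Two quick scikit-learn sketches for this family: Gaussian process regression with predictive uncertainty, and multinomial naive Bayes on a toy bag-of-words problem; the tiny datasets and the spam/ham labels are invented for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Gaussian process regression: posterior mean plus an uncertainty estimate.
X = np.linspace(0, 5, 12).reshape(-1, 1)
y = np.sin(X).ravel()
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0)).fit(X, y)
mean, std = gp.predict([[2.5], [10.0]], return_std=True)
print("GP mean:", mean, "GP std:", std)     # std grows far from the training data

# Multinomial naive Bayes on a (made-up) three-document spam/ham corpus.
docs = ["cheap pills buy now", "meeting agenda attached", "buy cheap meds"]
labels = [1, 0, 1]                           # 1 = spam, 0 = ham (hypothetical)
vec = CountVectorizer()
nb = MultinomialNB().fit(vec.fit_transform(docs), labels)
print("P(spam):", nb.predict_proba(vec.transform(["buy cheap now"]))[0, 1])
```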

Instance-based / Non-parametric Family

  • k-Nearest Neighbors: No training phase; prediction is by majority vote / averaging over the k closest training examples in feature space; the distance metric is critical (Euclidean, Manhattan, Mahalanobis, learned metrics…); simple, powerful on small data, suffers from the curse of dimensionality and high inference cost.
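
A from-scratch NumPy sketch of the k-NN prediction rule to make the "no training phase" point concrete: prediction is just a distance computation over the stored data plus a majority vote (toy data, Euclidean metric assumed).

```python
# Minimal k-nearest-neighbors classifier: no fitting, only lookup at query time.
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    dists = np.linalg.norm(X_train - x_query, axis=1)    # Euclidean distances
    nearest = np.argsort(dists)[:k]                       # indices of k closest points
    return np.bincount(y_train[nearest]).argmax()         # majority vote

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 2))
y_train = (X_train[:, 0] > 0).astype(int)
print(knn_predict(X_train, y_train, np.array([1.5, 0.0]), k=5))  # -> 1
```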

Generative / Density-estimation Family

  • Variational Autoencoders (VAEs): Latent-variable generative model trained with amortized variational inference + reconstruction loss + KL divergence; smooth latent space, good for disentanglement, semi-supervised learning, generative modeling.

  • Generative Adversarial Networks (GANs): Two-player minimax game between a generator and a discriminator; produces very sharp samples; notoriously unstable training; many variants (DCGAN, StyleGAN, BigGAN), though diffusion models have overtaken GANs on most image-generation benchmarks.

  • Diffusion Models: Iterative denoising process that learns to reverse a forward noise-adding Markov chain; currently state-of-the-art in image/video generation (Stable Diffusion, DALL·E 3, Sora-like models); strong likelihood estimation and classifier-free guidance; the forward process and training objective are sketched below.
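
A NumPy sketch of the DDPM-style forward noising process and the simple denoising training target (predict the added noise); the schedule is the standard linear one and the "model" is a zero placeholder, so this only illustrates the objective, not a working generator.

```python
# Forward (noising) process q(x_t | x_0) and the noise-prediction MSE objective.
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)             # linear noise schedule
alphas_bar = np.cumprod(1.0 - betas)            # cumulative product \bar{alpha}_t

def forward_noise(x0, t, rng):
    """Sample x_t ~ N(sqrt(abar_t) * x_0, (1 - abar_t) * I) and return the noise."""
    eps = rng.normal(size=x0.shape)
    x_t = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps
    return x_t, eps

rng = np.random.default_rng(0)
x0 = rng.normal(size=(8, 8))                    # stand-in for an image
t = rng.integers(T)                             # random training timestep
x_t, eps = forward_noise(x0, t, rng)

eps_pred = np.zeros_like(eps)                   # placeholder for eps_theta(x_t, t)
loss = np.mean((eps - eps_pred) ** 2)           # the simple denoising MSE objective
print("timestep:", t, "loss:", loss)
```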

Meta-techniques / Wrappers

  • Ensemble Methods: Bagging, boosting, stacking, voting, blending…; combine diverse base learners to reduce variance/bias; almost always improve performance; Random Forest & GBM are the most famous special cases.
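
A minimal soft-voting ensemble sketch in scikit-learn combining three of the base learners described above; dataset and hyperparameters are illustrative.

```python
# Soft-voting ensemble: average the predicted probabilities of diverse learners.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("tree", DecisionTreeClassifier(max_depth=5)),
                ("nb", GaussianNB())],
    voting="soft",                      # average class probabilities across models
)
print("ensemble CV accuracy:", cross_val_score(ensemble, X, y, cv=5).mean())
```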